NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Linear Recursive Feature Machines provably recover low-rank matrices

https://doi.org/10.1073/pnas.2411325122

Radhakrishnan, Adityanarayanan; Belkin, Mikhail; Drusvyatskiy, Dmitriy (April 2025, Proceedings of the National Academy of Sciences)

A fundamental problem in machine learning is to understand how neural networks make accurate predictions, while seemingly bypassing the curse of dimensionality. A possible explanation is that common training algorithms for neural networks implicitly perform dimensionality reduction—a process called feature learning. Recent work [A. Radhakrishnan, D. Beaglehole, P. Pandit, M. Belkin,Science383, 1461–1467 (2024).] posited that the effects of feature learning can be elicited from a classical statistical estimator called the average gradient outer product (AGOP). The authors proposed Recursive Feature Machines (RFMs) as an algorithm that explicitly performs feature learning by alternating between 1) reweighting the feature vectors by the AGOP and 2) learning the prediction function in the transformed space. In this work, we develop theoretical guarantees for how RFM performs dimensionality reduction by focusing on the class of overparameterized problems arising in sparse linear regression and low-rank matrix recovery. Specifically, we show that RFM restricted to linear models (lin-RFM) reduces to a variant of the well-studied Iteratively Reweighted Least Squares (IRLS) algorithm. Furthermore, our results connect feature learning in neural networks and classical sparse recovery algorithms and shed light on how neural networks recover low rank structure from data. In addition, we provide an implementation of lin-RFM that scales to matrices with millions of missing entries. Our implementation is faster than the standard IRLS algorithms since it avoids forming singular value decompositions. It also outperforms deep linear networks for sparse linear regression and low-rank matrix completion.
more » « less
Free, publicly-accessible full text available April 1, 2026
Active Manifolds, Stratifications, and Convergence to Local Minima in Nonsmooth Optimization

https://doi.org/10.1007/s10208-025-09691-0

Davis, Damek; Drusvyatskiy, Dmitriy; Jiang, Liwei (January 2025, Foundations of Computational Mathematics)

Free, publicly-accessible full text available January 22, 2026
Asymptotic normality and optimality in nonsmooth stochastic approximation

https://doi.org/10.1214/24-AOS2401

Davis, Damek; Drusvyatskiy, Dmitriy; Jiang, Liwei (August 2024, The Annals of Statistics)

Full Text Available
Flat minima generalize for low-rank matrix recovery

https://doi.org/10.1093/imaiai/iaae009

Ding, Lijun; Drusvyatskiy, Dmitriy; Fazel, Maryam; Harchaoui, Zaid (April 2024, Information and Inference: A Journal of the IMA)

Abstract Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima—those around which the loss grows slowly—appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyse overparameterized matrix and bilinear sensing, robust principal component analysis, covariance matrix estimation and single hidden layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. For matrix completion, we establish weak recovery, although empirical evidence suggests exact recovery holds here as well. We complete the paper with synthetic experiments that illustrate our findings.
more » « less
Full Text Available
Stochastic approximation with decision-dependent distributions: asymptotic normality and optimality

Cutler, Josh; Diaz, Mateo; Drusvyatskiy, Dmitriy (January 2024, Journal of machine learning research)

Full Text Available
Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

Liu, Chaoyue; Drusvyatskiy, Dmitriy; Belkin, Misha; Davis, Damek; Ma, Yian (January 2024, Advances in neural information processing systems)

Full Text Available
Multiplayer performative prediction: Learning in decision-dependent games

Narang, Adhyyan; Faulkner, Evan; Drusvyatskiy, Dmitriy; Fazel, Maryam; Ratliff, Lillian J (December 2023, Journal of Machine Learning Research)

Full Text Available
Stochastic Optimization under Distributional Drift

Cutler, Joshua; Drusvyatskiy, Dmitriy; Harchaoui, Zaid (October 2023, Journal of Machine Learning Research)

We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. The efficiency estimates we obtain clearly decouple the contributions of optimization error, gradient noise, and time drift. Notably, we identify a low drift-to-noise regime in which the tracking efficiency of the proximal stochastic gradient method benefits significantly from a step decay schedule. Numerical experiments illustrate our results.
more » « less
Full Text Available
Stochastic algorithms with geometric step decay converge linearly on sharp functions

https://doi.org/10.1007/s10107-023-02003-w

Davis, Damek; Drusvyatskiy, Dmitriy; Charisopoulos, Vasileios (September 2023, Mathematical Programming)

Full Text Available
Stochastic algorithms with geometric step decay converge linearly on sharp functions

Davis, Damek; Drusvyatskiy, Dmitriy; Charisopoulos, Vasileios (September 2023, Mathematical programming)

Stochastic (sub)gradient methods require step size schedule tuning to perform well in practice. Classical tuning strategies decay the step size polynomially and lead to optimal sublinear rates on (strongly) convex problems. An alternative schedule, popular in nonconvex optimization, is called geometric step decay and proceeds by halving the step size after every few epochs. In recent work, geometric step decay was shown to improve exponentially upon classical sublinear rates for the class of sharp convex functions. In this work, we ask whether geometric step decay similarly improves stochastic algorithms for the class of sharp weakly convex problems. Such losses feature in modern statistical recovery problems and lead to a new challenge not present in the convex setting: the region of convergence is local, so one must bound the probability of escape. Our main result shows that for a large class of stochastic, sharp, nonsmooth, and nonconvex problems a geometric step decay schedule endows well-known algorithms with a local linear (or nearly linear) rate of convergence to global minimizers. This guarantee applies to the stochastic projected subgradient, proximal point, and prox-linear algorithms. As an application of our main result, we analyze two statistical recovery tasks—phase retrieval and blind deconvolution—and match the best known guarantees under Gaussian measurement models and establish new guarantees under heavy-tailed distributions.
more » « less
Full Text Available

« Prev Next »

Search for: All records